ICE: Idiom and Collocation Extractor for Research and Education

Authors

  • Vasanthi Vuppuluri
  • Shahryar Baki
  • An Thai Nguyen
  • Rakesh M. Verma
Abstract

Collocation and idiom extraction are well-known challenges with many potential applications in Natural Language Processing (NLP). Our experimental, open-source software system, called ICE, is a Python package for flexibly extracting collocations and idioms, currently in English. It also has a competitive POS tagger that can be used alone or as part of collocation/idiom extraction. ICE is available free of cost for research and educational uses in two user-friendly formats. This paper gives an overview of ICE and its performance, and briefly describes the research underlying the extraction algorithms.

Similar resources

Tools for Collocation Extraction: Preferences for Active vs. Passive

We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especially for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision eval...

The Performance of Iranian EFL Learners in Producing and Recognizing Idiom-Containing Sentences

This study aimed to investigate how Iranian EFL learners performed in producing sentences containing idioms and whether they had any problems in producing such sentences. This query, subsequently, raised the question of whether idioms influenced the participants’ grammaticality judgment on idiom-containing sentences. For this purpose, firstly, the writings of 24 learners were investigated for a...

Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses

In this paper, we describe an algorithm that employs syntactic and statistical analysis to extract bilingual collocations from a parallel corpus. The preferred syntactic patterns are obtained from idioms and collocations in a machine-readable dictionary. Phrases matching the patterns are extracted from aligned sentences in a parallel corpus. Those phrases are subsequently matched up via cross-lin...

The Verb in the Terminological Collocations. Contribution to the Development of a Morphological Analyser: MorphoCom

Considering that we are observing and describing the behaviour of the terminological units and the terminological collocations, we intend to discuss the value of the verb as a nuclear element of the terminological collocation in the Portuguese language. So we will emphasize the theoretical distinction between multilexemic terminological unit and terminological collocation and the importance ...

The Comparative Effect of Using Idioms in Conversation and Paragraph Writing on EFL Learners’ Idiom Learning

This study investigated the comparative effect of teaching idiomatic expressions through practicing them in conversation and paragraph writing on intermediate EFL learners’ idiom learning. The participants were sorted out of a population of 134 intermediate students in Zabansara Language School in Khorramabad based on their scores on a Preliminary English Test (PET) and an idiom test piloted in...

Publication date: 2017